AD-A212  854 


I 


Systems 

Optimization 

Laboratory 


Department  of  Operations  Research 
Stanford  University 
Stanford,  CA  94305 


■J  -  on 4  ftiti  Mf 

ius "i.tci  Sfi  Kr..aat*d, 


89  9  27  06r 


SYSTEMS  OPTIMIZATION  LABORATORY 
DEPARTMENT  OF  OPERATIONS  RESEARCH 
STANFORD  UNIVERSITY 
STANFORD.  CALIFORNIA  91305-4022 


Monte  Carlo  (Importance)  Sampling  within  a  Benders' 
Decomposition  Algorithm  for  Stochastic  Linear  Programs 

by 

Gerd  Infanger 

TECHNICAL  REPORT  SOL  89-13 
September  1989 


Research  and  reproduction  of  this  report  were  partially  supported  by  the  U  S.  Department,  of  Energy 
Grant  DE-FG03-87ER25028;  Office  of  Naval  Research  Contract.  N000L1-89-J- 1639;  the  National 
Science  Foundation  Grants  DDM-88 14253,  D MS-89 13089  and  the  Austrian  Science  Foundation. 
Tends  zur  Forderung  der  vvissenschaftlichen  Forschung,"  Grant  J0323  -  Phy. 

Any  opinions,  findings,  and  conclusions  or  recommendations  expressed  in  this  publication  are  those 
of  the  author  and  do  NOT  necessarily  reflect  the  views  of  the  above  sponsois. 


Reproduction  in  whole  or  in  part  is  permitted  for  any  purposes  of  the  United  States  Government 
I  his  document  has  been  approved  for  public  release  and  sale;  its  distribution  is  unlimited. 


Monte  Carlo  (Importance)  Sampling  within  a  Benders' 
Decomposition  Algorithm  for  Stochastic  Linear  Programs 


Gerd  Infanger 
Stanford  University 


Abstract 


A  method  employing  decomposition  techniques  and  Monte 
Carlo  sampling  (importance  sampling)  to  solve  stochastic 
linear  programs  is  described  and  applied  to  capacity 
expansion  planning  problems  of  electric  utilities.  We 
consider  uncertain  availability  of  generators  and 
transmission  lines  and  uncertain  demand.  Numerical 
results  are  presented. 
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Introduction 


A  stochastic  linear  program  is  a  linear  program  whose 
parameters  (coefficients,  right  hand  sides)  are  uncertain.  The 
uncertain  parameters  are  assumed  to  be  known  only  by  their 
distributions.  That  means  that  the  values  of  some  functions  are 
numerical  characteristics  of  random  phenomena,  e.g.  mathematical 
expectations  of  functions  dependent  on  decision  variables  and 
random  parameters  (Kali.  et.  al.  1988). 

Suppose  a  function  z  =  E  C(V)  is  an  expectation  of  a 
function  C(V^;,  <5  e  12.  V  is  a  random  parameter  which  has 
outcomes  .  12  is  the  set  of  all  possible  random  events.  It  can 
be  finite,  infinite,  discrete  or  continuous.  In  the  continuous 
case  the  computation  of  the  expected  value  requires  to  solve  the 
integral : 


E  C(V) 


J  C(V‘S)P(d6) 


with  P  being  the  probability  measure. 


In  a  general  case  V  would  consist  of  several  components, 

r 

e.g.  V  =  (V1,....,V^1)  with  outcomes  V  wh '  we  also  will  denote 
by  lower  case  letters,  e.g.  v  =  (vl,  .  .  .  .  "  and  p(V^)  alias 
p(v)  would  denote  the  corresponding  density  function.  In  this 
case  the  above  mentioned  integral  takes  the  form  of  a  multiple 
integral : 


v  rtv) 


C  (v)  p (v) dv 


1 


dv 


h 


In  the  case  of  12  being  discrete  and  finite  the  expectation 
can  be  computed  by  c.  multiple  cum: 
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E  C  ( V)  =  Z.  .  .  E  C  ( v)  p  ( v ) 
V1  vh 


The  main  difficulties  in  stochastic  linear  programming  deal 
with  the  evaluation  of  the  multiple  integral  or  the  multiple  sum. 
The  numerical  computation  of  the  expectation  requires  a  large 
number  of  function  evaluations  and  each  function  evaluation  means 
a  linear  program  to  be  solved.  Different  approaches  attack  this 
problem,  e.g.  Birge  and  Wallace  (1988),  Kali  et.  al.  (1988), 
Rockafellar  and  Wets  (1989)  and  others.  We  follow  the  concept  of 
Dantzig  et .  al.  (1989). 


Two  Stage  Stochastic  Linear  Program 

An  important  class  of  models  concerns  dynamic  linear 
programs.  Activities  initiated  at  time  t  have  coefficients  at 
time  t  and  t+1.  Deterministic  dynamic  linear  programs  appear  as 
staircase  problems.  The  simplest  staircase  problem  is  that  with 
two  stages:  X  denotes  the  first,  Y  the  second  stage  decision 
variables,  A,  b  represent  the  coefficients  and  right  hand  sides 
of  the  first  stage  constraints  and  D,  d  concern  the  second  period 
constraints  together  with  B  which  couples  the  two  periods,  c,  f 
are  the  objective  function  coefficients. 

In  the  deterministic  case  c,  f,  A,  b,  B,  D  ,d  are  known  with 
certainty  to  the  planner.  In  the  stochastic  case,  the  parameters 
of  the  second  stage  are  not  known  to  the  planner  at  time  t=l,  but 
will  be  known  at  time  t=2 .  At  time  t=l  only  the  distribution  of 
these  parameters  are  assumed  to  be  known.  The  second  stage 
parameters  can  be  seen  as  random  variables  which  get  certain 
outcomes  with  certain  probabilities.  We  denote  a  certain  outcome 
of  these  parameters  with  <5  and  the  corresponding  probability  with 
p(<5),  C  e  12  ,  the  set  of  possible  outcomes. 
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min  z  =  cX  +  E6 ( fY6) 

s/t  AX  =  b  (1) 

B5X  +  DY6  =  d6 

X,Y6  >  0,  Sen 


In  (1)  a  two  stage  staircase  problem  is  transformed  into  a 
two  stage  stochastic  linear  program  and  the  parameters  d^  and 
being  random  variables.  Given  the  two  stage  stochastic  linear 
program  one  wants  to  make  a  decision  X  which  is  feasible  for  all 
scenarios  and  has  the  minimum  expected  costs. 

We  consider  the  case  of  fi  being  discrete  and  finite,  e.g.  fi 
=  ( 1 ,  .  .  .  ,  K)  ,  the  parameter  6  takes  on  K  values.  Then  we  can 
formulate  an  equivalent  deterministic  problem  to  the  stochastic 
linear  problem.  This  is  tractable  if  K  is  small.  For  K=3  the 
deterministic  equivalent  problem  is  given  in  (2). 


min  2  =  cX  +  p1fY1+  p2fY2+ 
s/t  AX 

B1X  +  DY1 

B2X  +  DY2 

B3X 

X,  Y1,  Y2,  Y3 


p3fY3 

=  b  (2) 

=  d1 
=  d2 

+  DY3  =  d3 
>  0 


Two  stage  stochastic  linear  programs  were  first  studied  in 
Dantzig  (1955)  and  then  developed  by  many  authors.  The  method 
which  we  want  to  apply  here  is  using  Benders  (1962) 
decomposition.  Van  Slyke  and  Wets  (1969)  suggested  to  express  the 
impact  of  the  second  period  by  a  scalar  0  and  "cuts",  which  are 
necessary  conditions  to  the  problem  and  are  expressed  only  in 
terms  of  the  first  period  variables  X  and  9.  Benders 
decomposition  splits  the  original  problem  into  a  master  problem 
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and  a  subproblem  which  decomposes  into  a  series  of  independent 
subproblems,  one  according  to  each  defl.  The  master  problem,  the 
sub  problems  and  the  cuts  are  represented  in  (3),  (4)  and  (5). 


The  master  problem: 


min  zM  =  cX  +  0 


s/t 


AX 

G1X  +  a0 

x,  e 


=  b 

=  g1,  1  =  l,...,L 


>  o 


(3) 


The  sub  problems: 


min  Zo5  =  p5fY6 


s/t 


(4) 


P^tt6:  DY6  =  <i6  +  B^X 

Ys  >  0,  S  e  n,  e.g.  n  =  {1,2,3} 


where  p^7r^  is  the  optimal  dual  solution  of  subproblem  6. 
The  cuts: 


g  =  Z6  pSn6d6  =  E(n6dS)  (5) 

G  =  Z6  pSn6BS  =  E(nSB6) 

a  =  0  ...  feasibility  cut 
a  =  1  ...  optimality  cut 

Solving  the  master  problem  we  obtain  a  solution  X.  Given  X 
we  can  solve  K  subproblems  den  to  compute  a  cut.  The  cut  is  a 
lower  bound  on  the  expected  value  of  the  second  stage  costs 
represented  as  a  function  of  X.  Cuts  are  sequentially  added  to 
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the  master  problem  and  new  values  of  X  are  obtained  until  the 
optimality  criterion  is  met.  We  distinguish  between  two  types  of 
cuts,  feasibility  cuts  and  optimality  cuts.  The  first  refers  to 
infeasible  subproblems  for  a  given  X  and  the  latter  to  feasible 
and  optimum  subproblems,  given  X. 

If  the  expected  values  z,  G,  and  g  are  computed  exactly, 
that  if.,  by  evaluating  all  scenarios  S  e  n,  we  refer  to  it  as  the 
universe  case.  As  we  will  see  later  the  number  of  scenarios 
easily  gets  out  of  hand  and  it  is  not  always  possible  to  solve 
the  universe  case.  Therefore  methods  are  sought  which  guarantee  a 
satisfying  solution  without  solving  the  universe  case.  Employing 
Monte  Carlo  methods  seems  to  be  a  promising  approach. 


Monte  Carlo  Sampling 

Each  iteration  of  Benders'  decomposition  requires  the 
computation  of  expected  values  ,e.g.  the  subproblem  costs,  the 
coefficients  and  right  hand  sides  of  the  cuts.  For  each  outcome  <5 
c  n  a  linear  program  has  to  be  solved.  The  expected  value  of  e.g. 
the  subproblem  costs  is  denoted  by 


zs  =  E  C(V6)  =  E  fY5  ,  6  e  n. 

The  number  of  elements  of  n  is  determined  by  the 
dimensionality  of  the  stochastic  vector  V  =  (V-^ ,  .  .  .  ,  V^)  . 
Typically  the  dimension  h  of  V  is  quite  large.  E.g.  in  expansion 
planning  problems  of  electric  power  systems  a  component  of  V 
denotes  the  availability  of  a  type  of  generators  or  a  demand  of 
power  in  a  node  of  a  multi  area  supply  network  or  the 
availability  of  a  type  of  transmission  line  connecting  two  nodes. 
Consider  several  nodes  and  arcs  and  one  demand  and  some  options 
of  generators  in  each  node.  The  number  of  scenarios  K  in  the 
universe  case  gets  quickly  out  of  hand,  even  if  the  distribution 
of  each  component  of  V  is  determined  by  just  a  small  number  K1  of 
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discrete  points.  Suppose  e.g.  h  =  20  and  K1  =  5.  Then  the  total 
number  of  terms  in  the  expected  value  calculations  is  K  =  520  = 

IT 

10  ,  which  is  not  practically  solvable  by  direct  summation 

(Glynn  in  Dantzig  et  al.  (1989)).  Monte  Carlo  methods  are 
recommended  to  compute  multiple  integrals  or  multiple  sums  for  h 
large  (Davis  and  Rabinowitz  (1984)).  See  Hammersly  and  Handscomb 
(1964)  for  a  description  of  Monte  Carlo  sampling  techniques. 


Crude  Monte  Carlo 

Suppose  V^,  6  =  l,...,n  are  scenarios,  sampled  independently 
from  their  joint  probability  mass  function,  then  C^  =  C(V^)  are 
independent  random  variates  with  expectation  z.  (The  subscript  S 
is  suppressed  now  as  there  is  no  danger  of  confusion.) 


z  =  (1/n)  E  C5 
5=1 


(6) 


is  an  unbiased  estimator  of  z  and  its  variance 

a2z  =  a2/ n 
a2  =  var ( C ( V) ) . 


Thus  the  standard  error  is  decreasing  with  sample  size  n  by 
n~0.5.  -phe  convergence  rate  of  z  to  z  is  independent  of  the 
dimension  h  of  the  random  vector  V. 
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Importance  Sampling 
We  rewrite 


z  =  Ev  c(v) p(v)  =  Ev  c(v)p(v)q(v)/q(v) 


by  introducing  a  probability  mass  function  q(v) .  We  can  see 
q  as  a  probability  mass  function  of  a  random  vector  W,  therefore 


z  =  E  C(W)p(W)/q(W) 


and  we  obtain  a  new  estimator 


z  =  ( 1/n)  £  C(W<S)p(W<S)/q(W<5) 

6  =  1 


which  has  a  variance  of 


var(z)  =  ( l/n)  Ew(C(w) p(w)/q(w)  -  z)2  q(w) 

Choosing  q*(w)  =  C(w)p(w)/(EW  C(w)p(w))  would  lead  to  var(z) 
=  0,  that  means  we  could  get  a  perfect  estimate  of  the  multiple 
sum  just  by  one  single  observation.  However  this  is  practically 
useless,  since  to  sample  C.p/q  we  have  to  know  q  and  to  determine 
q  we  need  to  know  z  =  Ew  c(w)p(w),  which  we  eventually  want  to 
compute.  Nevertheless  this  result  helps  to  derive  some  heuristics 
of  how  to  choose  q:  It  should  be  approximately  proportional  to 
the  product  C(w)p(w)  and  have  a  form  which  can  be  integrated 
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theoretically.  For  the  theory  of  importance  sampling  we  refer  to 
Glynn  and  Iglehart  (1988)  and  Dantzig  and  Glynn  (1989a). 

P. Glynn  and  M.Nakayama  in  Dantzig  et.  al.  (1989)  developed 
an  importance  sampling  scheme  using  an  additive  model  to 
approximate  the  cost  function  E  C(V): 

h 

C ( v )  =  £  Ci(vi) . 
i  =  l 

Actually  C(v)  is  achieved  by  a  marginal  cost  model, 
considering  marginal  costs  in  each  dimension  i  of  V. 


C(v)  =  C ( r )  +  s  Mi(vi) 
i  =  l 


(7) 


Mi(vi)  =  C(r1, . . ,ri_1,vi,ri+1, . . ,rd)  -C(r) 


r  =  ( 7 i >  • • • / rh)  can  be  anY  arbitrary  chosen  point  out  of  the 
set  of  values  v^,  i  =  l,..,h.  For  example  we  choose  r ^  that 
outcome  of  which  leads  to  the  respectively  lowest  costs.  In 
the  context  of  expansion  planning  of  power  systems  this  means 
selecting  respectively  lowest  demands  and  highest  availabilities 
of  generators  and  transmission  lines. 

Defining 


Mi  =  E  Mi(Vi) 


(8) 


and 
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F  ( v) 


((C(v) 


C(r)  )/ 


h 

2 

i=l 


Mi 


i(vi) 


(9) 


we  can  express  the  expected  value  of  the  costs  in  the 
following  form,  e.g.  in  the  case  of  h  =  3: 


z 


C(T)  + 

H1 

F(v) 

+ 

K2 

2v 

F(v) 

+ 

m3 

T 

V 

F  ( V) 

iPl(v1)M1(v1)/M1)p2 (v2)P3 (v3) 

Pl(vx) (P2(v2)M2(v2)/M2)P3(v3^ 

Pl(vl)P2(v2) (P3 (v3)M3 (v3)/R3) 

(10) 


Note  that  this  formulation  consists  of  a  constant  term  and  h 
expectations.  Given  a  fixed  sample  size  n  we  partition  n  into  n1, 
i  =  l,..h  sub-samples  ,  such  that  £  =  n  and  n^  >  1,  i  - 
l,...,n  and  n^  being  approximately  proportional  to  .  The  h 
expectations  are  separately  approximated  by  sampling  using 
marginal  densities.  The  i-th  expectation  corresponds  of  course  to 
the  i-th  component  of  V.  Generating  sample  points  in  the  i-th 
expectation  we  use  the  importance  density  (p^M^/M^)  for  sampling 
the  i-th  component  of  V  and  the  original  densities  for  any  other 
components.  Denoting 


=  (l/r^) 


ni 

£ 

j=l 


F  ( Vj  ) 


(11) 


the  estimate  of  the  i-th  sum,  we  obtain 
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h 

z  =  C ( r )  +  E  M iMi  ,  (12) 

i=l 


the  expected  value  of  the  second  stage  costs  C(V) . 

Let  o^2  be  the  sample  variance  of  the  i-th  expectation, where 
o^2  =  0  if  n^  =  1.  The  variance  of  the  mean  is  then  given  by 


2 


E  M i2ai2  /  nt 

i=l 


(13) 


Using  importance  sampling  one  can  achieve  significant 
variance  reduction.  The  experiment  of  M.  Nakayama  in  Dantzig  et 
al.  (1989)  claims  a  variance  reduction  of  1:20000  using 
importance  sampling  versus  crude  Monte  Carlo  sampling:  For  a 
given  and  optimal  X  the  second  stage  costs  of  a  multi  area 
expansion  planning  model  with  192  universe  scenarios  were  sampled 
with  a  sample  size  of  10  using  both  methods  and  the  results 
compared . 

The  derivation  above  concerned  the  expected  second  stage 
costs  z.  To  derive  a  cut  we  use  the  same  framework  analogously. 
Note  that  a  cut  is  defined  as  an  outer  linearization  of  the 
second  stage  costs  represented  as  a  function  of  X,  the  first 
stage  variable^.  At  X,  the  value  of  the  cut  is  exactly  the 
expected  seer,-  stage  costs  z.  Therefore  we  can  employ  directly 
the  cost  app1  -..^mation  scheme  and  the  importance  distribution  to 
compute  the  par'.  ers  of  a  cut.  We  define 
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h 

FG(v)  =  ((7TB)  (v)  -  (JTB)(r))/  2  Mi(vi)  (14) 

i  =  l 


h 

Fg(v)  =  ( (*rd)  (v)  -  (7rd )  ( r )  )/  Z  Mi(vi)  (15) 

i=l 

and  get  e.g.  in  the  case  of  h  =  3 

G  =  (7TB)  (r)  +  Mx  Ev  FG(v)  (Pl  (vx)  Mx  (vL)  /M^  p2  (v2  )  p3  ( v3  ) 

+  M2  Zv  Fg(v)  p1(v1) (p2 (v2)M2 (v2)/M2)p3 (v3) 

+  M3  2v  Fg(v)  Px (v1)P2 (v2) (p3 (v3)M3 (v3)/M3) 

(16) 

g  =  (7rd)(r)  +  Sv  Fg(v)  (Pj_  (v1)M1  (Vj^)^)  p2  (v2)  p3  (v3) 

+  M2  Zv  Fg(v)  Pj^Vj.)  (p2  (v2)M2  (v2)/M2)p3  (v3) 

+  M3  Zv  Fg(v)  P1(v1)P2(v2) (p3(v3)M3(v3)/M3) 

(17) 

the  coefficients  and  the  right  hand  side  of  a  cut.  We 
compute  the  expected  values  again  by  sampling  using  the  same 
sample  points  as  at  hand  from  the  computation  of  z. 

Using  Monte  Carlo  sampling  we  obtain  z  (a^) ,  G,  g,  which  are 
approximations  of  the  expected  values  z  =  E  c(V),  G  =  E  ( tt B )  (V)  , 
g  =  E  (7rd)  (V)  .  The  impact  of  using  approximations  instead  of  the 
exact  parameters  on  the  Benders  decomposition  algorithm  is 
analyzed  in  the  following  section. 
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Benders1  Decomposition 

In  the  following  we  will  derive  the  main  steps  of  Benders 
decomposition  algorithm  for  two  stage  stochastic  linear  programs 
considering  the  "universe"  case,  which  gives  the  exact  solution 
of  the  equivalent  deterministic  problem  ("certainty  equivalent"). 
We  will  then  analyze  the  impact  of  sampling  of  subproblems  on 
Benders  decomposition.  See  Geoffrion  (1970)  for  a  derivation  of 
Benders  decomposition  algorithm. 

Given  the  equivalent  deterministic  problem  in  (2)  and 
assuming  that  K  =  3  describes  the  universe  case,  we  rewrite  the 
problem  applying  projection  onto  the  X  variables  and  obtain  (18). 
We  assume  for  simplicity  that  (2)  is  feasible  and  has  a  finite 
optimum  solution. 


min  z  = 

cX  +  Inf[p1fY1+  p2  f Y2+  p3fY3]  (18) 

AX  =  b  DY1  =  d1  +  B3X 

X  >  0  DY2  =  d2  +  B2X 

DY3  =  d3  +  B3X 
Y1 ,  Y2,  Y3  >  0 

The  infimal  value  function  in  (18)  corresponds  to  the 
following  primal  linear  problem  (19): 


min  Zp  = 

pXf Y1  +  p2  f Y2  +  p3  f Y3 

=  E6 

(fY5) 

1  1 
p  : 

DY1 

=  d1 

+  B1X 

2  2 
pn  : 

DY2 

=  d2 

+  B2X 

3  3 
p  jr>: 

DY3 

=  d3 

+  b3x 

Y1,  Y2,  Y3  >  0 


and  to  the  dual  linear  problem  (20) : 
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max  zD  =  (20) 

p^n1  (d1+B1X)  +  p27T2  (d2  +  B2X)  +  p37T3  (d3+B3X) 

ir1D  =  f 

7T 2  D  =  f 

7T3D  =  f 

7 IT1,  7T2,  7T3  >  0 

The  primal  problem  is  parameterized  in  the  right  hand  side 
by  X.  The  assumption  (2)  being  finite  implies  that  (19)  is  finite 
for  at  least  one  value  of  X.  Applying  the  Duality  Theorem  of 
Linear  Programming  we  state  that  (20)  has  to  be  feasible.  The 
feasibility  conditions 

7r6D  -  f  =  0 

indicate  that  the  feasible  region  { tt 5 1 7r 6 D  -  f  <  0)  is 

independent  of  X  and  just  repeated  for  each  scenario  Sen. 

The  assumption  (2)  being  feasible  requires  feasibility  of 
the  primal  problem  (19)  for  at  least  one  X.  By  the  Duality 
Theorem  again  (20)  has  to  be  finite.  Let  ,  j=l,...,p  be  the 
extreme  points  and  ,  j=p+l, . . . ,p+q  be  representatives  of  the 
extreme  rays  of  the  feasible  region  of  (9)  .  Problem  (20)  is 
finite  if  and  only  if 

7Tj  (d5  +  B5X)  <0,  j  =  p+1, - , p+q  (21) 

Sen 

We  append  constraints  (21)  to  problem  (18)  to  ensure  that 
the  problem  is  bounded. 
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We  call 

z^*  =  Max  (d^+B^X)  =  zDS(n6*)  (22) 

l<j<p 

/\ 

the  optimum  second  stage  costs  for  given  X  in  scenario  6. 
7 r^*  denotes  the  optimum  dual  variables  for  scenario  S  selected 
from  the  set  j=l,..,p.  It  is  clear  that 

zD*  =  2  zD6*  (23) 

Sen 


bet  7r1*,7r2*,7r3*  be  the  vector  of  optimum  dual  variables  of 

✓\  ,  ,  t 

the  second  stage  problem,  given  X,  we  outer  linearize  the  infimal 
value  function  in  (18).  By  the  dual  problem  (19)  we  obtain: 


zD*  =  p1^1*  (d1+B1X)  +  p27r2*(d2+B2X)  +  p3?r3*  (d3+B3X) 

(24) 


and 


ZU(7T1*,7r2*,7T3*,X)  >  zL'(7r-L",7r';",7rJ",X) 


1*  „2*  -3* 


(25) 


formulates  the  main  property  of  the  outer  linearization. 
This  property  enables  us  to  rewrite  problem  (18)  by  expressing 
the  infimal  value  function  by  the  outer  linearized  dual  problem. 


min  z 
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zD(X)  = 

Max  pV  •  1(d1+B1X)  +  p27r-j2  (d2+B2X)  +  p3^ 3  (d3+B3X) 

l<j<P,  J  J 


Using  0  as  the  greatest  lower  bound  the  problem  can  be 
represented  in  the  following  form: 

min  z  =  cX  +  0  (27) 

AX  =  b 

X  >0 

0  >  p1^  1  (d1+B1X)  +  p27Tj2  (d2+B2X)  +  p37Tj3  (d3+B3X) 
j  —  1 /••••<  p 

7T  j  ( d*5  +  B^X)  <0,  j  =  p-rl ,  .  .  .  .  ,p+q 

S  e  n 


Relaxation  is  applied  to  solve  problem  (12). 

To  test  a  solution  (X,  0)  one  solves  problem  (19)  or  problem 
(20) ,  actually  by  solving 
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6*  ■  S 

zs°  =  mm  Zp°  = 

fY6 

s/t  7 r^: 

DY6 

=  d6  + 

Y<5 

IV 

o 

or  by  solving 

S*  S 

zs°  =  max  zD°  = 

7T5  (d 

<S+B(SX) 

ttSD 


7T 


=  f 
>  0, 


Sen 


Sen 


(28) 


(29) 


If  primal  infeasibility  or  dual  unboundness  is  detected,  a 
feasibility  cut 


irs  (d6  +  BSX)  <  0 


(30) 


is  added  to  the  master  problem.  If  all  primal  problems  are 
feasible  or  all  dual  problems  unbounded  an  optimality  cut 

0  >  2  pS7r6  (d6+BSX)  (31) 

Sen 


is  added  to  the  master  problem. 

In  the  1-th  iteration 

z1  =  Zj^1*  =  cX1*  +  e1*  (32) 

is  defined  to  be  a  lower  bound  and 


18 


z3-  =  min{z^  cX-*-*  +  zg*},  z°  =  ^  (33) 

to  be  an  upper  bound  to  the  solution  of  the  problem.  If 

(z  -  z )  /  z  <  TOL, 

where  TOL  is  a  given  tolerance,  the  problem  is  said  to  be 
solved  with  a  sufficient  accuracy. 


Probabilistic  Cuts 

Employing  Monte  Carlo  sampling  techniques  means  not  to 
solve  all  problems  Sen,  but  solving  problems  6  e  S ,  S  being  a 
subset  of  f2.  Instead  of  the  exact  expected  values  zs,  G,  g  we 
compute  the  estimates  zs,  G,  g  by  weighted  sums.  Suppose  e.g.  in 
problem  (2)  S  is  the  set  of  problems  {1,2}  out  of  fi  =  {1,2,3}. 
For  example  in  the  case  of  crude  Monte  Carlo  sampling  scenarios  <5 
e  S  are  sampled  according  to  the  probability  mass  function  p^  and 
an  approximation  of  the  expected  value  is  obtained  by  computing 
the  mean  of  the  samples,  e.g.  zs  =  (z3-  +  z2)/2.  Referring  to  (2), 
a  cut  obtained  by  crude  Monte  Carlo  sampling  would  be  computed  as 

6>  (P1/ (p1+p2) ) (d1+B1X)  +  (p2/ (p1+p2) ) (d2+B2X) - 

Suppose  0  is  the  approximation  of  6,  the  exact  outer 
linearization  of  the  second  stage  costs.  The  difference  0-9 
describes  the  error  of  the  approximation.  Comparing  the  primal 
(19)  and  the  dual  (20)  problem  for  n  and  S  we  see 

9  -  ©(p1  +  p2)  =  zp3* 
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The  error  (0-0)  =  Zp3*  -  p30  is  constant  with  respect  to  X 
and  affects  therefore  just  the  right  hand  side  of  a  cut.  See 
also  Dantzig  and  Glynn  (1989b)  in  this  respect.  In  general  S  is  a 
sufficient  large  subset  of  f2,  0  is  computed  by  sampling  methods 
and  the  error  is  derived  from  the  variance  of  the  sample  mean  and 
assumed  to  be  small. 

Cuts  computed  by  samples  do  not  necessarily  meet  the 
condition  cf  outer  linearization  (25) .  Violating  this  condition  a 
cut  intersects  and  separates  parts  of  the  feasible  region  of  the 
second  stage  problem.  A  sampled  cut  )s  therefore  not  a  valid 
cut.  The  right  hand  sides  of  cuts  obtained  by  sampling  can  be 
seen  as  stochastic  parameters.  We  assume  normal  distributions 
defined  by  the  means  g3-  and  the  standard  deviations  <7~3.  We  know 
that  a -  =  a-y,  the  standard  deviation  of  the  second  stage  costs. 


Upper  and  Lower  Bounds 

For  random  right  hand  sides  g-  in  Benders'  master  also  the 
upper  and  lower  bounds  of  the  problem  are  probabilistic.  The 
standard  deviation  of  the  upper  bound,  a z3  is  given  by  a z3  =  azS1 
the  standard  deviation  of  the  subproblems  costs. 

To  determine  the  standard  deviation  of  the  lower  bound 
consider  the  master  problem  at  iteration  L: 


min 

ZM  “ 

cX  + 

0 

s/t 

: 

AX 

=  b 

7T1 : 

G1X 

+  10 

=  g1  (“z1) 

t rL: 

glx 

+  n0 

=  gL  (°zh) 

X,  0 


>  0  , 
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where  L  cuts  have  been  added  to  the  originally  relaxed 
master  problem.  The  right  hand  sides  are  independent  stochastic 
parameters,  assumed  normally  distributed.  We  assume  independence 
as  the  cuts  are  generated  from  independent  samples,  neglecting 
the  dependency  that  X-^,  1  =  1 ,  .  .  ,  L  are  weakly  connected  by  the 
Benders'  algorithm.  Under  these  assumptions  we  experimentally 
obtain  a  distribution  of  zM*  by  drawing  N  samples  j  =  1,..,N  from 
the  normal  distributions  of  g^.  Varying  the  right  hand  sides 
(g1,..,gL)j  ,  according  to  the  samples  j  =  1,..,N  and  solving 
the  master  problem  for  each  j  =  1,..,N  we  obtain  solutions  zM  j, 
j  =  1 , . . , N ,  which  determine  the  distribution  of  the  lower  bound. 
Assuming  normal  distribution,  we  compute  the  variance 


N 

o2zM  =  var(zM*)  =  ( 1/N-l }  £  (z  -  zM*)2  (35) 

j  =  l 


Solving  the  master  problem  N  times  to  obtain  an  estimate  of 
the  lower  bound  variance  is  very  expensive.  We  assume  that  the 
standard  deviations  oz1,  1=1,.., L  are  small  and  all  solutions 
zM* j  ,  j  =  1 ,  .  .  , N  have  the  same  basis.  Then  we  can  compute  the 
sample  points  from  the  dual  objective  function: 


j 


Z  Tr^g1,  -  g1) 
1  =  1  J 


N 

(1/N-l)  Z  (A-j)2 

j  =  l 


(36) 


(37) 
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or  directly: 


a 


2 


zM 


S  7J-  2  a  L2 
1=1  1  Z 


(38) 


Stopping  Rule 

The  accuracy  of  the  solution  is  influenced  by  the  accuracy 
of  the  cuts  determined  by  the  variances  £  and  is  a  function  of 
the  sample  size  n.  Given  a  fixed  sample  size  n,  the  Benders 
algorithm 

has  to  proceed  until  the  maximum  accuracy  is  achieved. 
Operating  with  probabilistic  bounds  the  maximum  accuracy  is 
reached  if  the  upper  and  lower  bounds  are  identical  in 
distribution.  See  Dantzig  and  Glynn  (1989b)  for  an  appropriate 
rule  to  determine  this  property.  At  least  no  improvement  is 
possible  anymore  if  the  upper  and  the  lower  bound  intersect. 
Employing  this  rule  we  thus  define 


z  -  z  <  0 


(39) 


as  the  stopping  criteria. 


The  Accuracy  of  the  Solution 

The  accuracy  of  the  solution  can  be  estimated  from  the 
distribution  of  the  lower  bound  of  the  problem  after  the  last 
iteration.  Given  z ^  and  oz^,  in  the  last  iteration,  we  compute  a 
confidence  interval  e.g. 
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2  c  9  5  Z0  95  a 2  1 

Zq  g5  being  the  95%  quantile  of  the  standard  normal 
distribution . 

If 

zc95/2M*  -  T0L'  (40) 

the  obtained  solution  is  satisfying.  The  quality  criterion 
TOL  is  understood  here  as  a  confidence  interval.  Otherwise  the 
sample  size  has  to  be  increased  and  the  problem  has  to  be  solved 
again  with  the  increased  sample  size. 


Improvement  of  the  Solution 

Suppose  the  solution  with  a  certain  sample  size  was  not 
satisfying.  Instead  of  starting  from  the  beginning  with  an 
increased  sample  size  we  want  to  use  the  information,  which  we 
have  already  collected.  To  do  this,  we  look  for  the  binding  cuts 
in  the  final  solution,  increase  the  sample  size  and  recompute  the 

.  .  A*l 

binding  cuts  at  the  same  Xx,  they  were  originally  computed.  The 
enlarged  sample  size  leads  to  smaller  variances  a  ^  of  the 
binding  cuts  and  eventually  to  a  smaller  confidence  interval  of 
the  final  solution.  Solving  the  master  problem  again  with  the 
improved  binding  cuts  will  in  general  not  result  in  an 
intersection  of  the  lower  and  upper  bound.  Therefore  some  more 
iterations  are  necessary  to  obtain  the  optimal  solution  according 
to  the  increased  sample  size.  This  improvement  procedure  is 
employed  iteratively  until  a  satisfying  solution  is  obtained.  We 
can  state  now  the  algorithm  as  follows: 
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The  Algorithm 


Step  0  Initialize 

1  =  0,  z°  =  ?- 

Step  1  Solve  the  relaxed  master  problem  and  obtain  a 
lower  bond:  zr~  =  cX^  +  9^ 

Step  2  1=  1  +  1 

Solve  subproblems  and  obtain  an 
upper  bound:  z-i  =  min{z1_^,  cX-^  +  zs  }  , 
compute  and  add  a  cut  to  the  master  problem, 
using  Monte  Carlo  (importance)  sampling. 

Step  3  Solve  the  master  problem  and  obtain  a 
lower  bond:  z^  =  cX^  +  6^ 

Step  4  If  (z  -  z  <  0)  go  to  step  2 

Step  5  Compute  confidence  interval  zcg5  and  obtain  a 
solution:  z,  X,  0 

Step  6  If  ( zC95/zm*  -  T0L)  stop,  otherwise  go  to  step  7 

Step  7  Increase  sample  size  and 
initialize  zD  = 

Step  8  Recompute  binding  cuts 

Upper  bound:  Zj_  =  min{Zj_1,  cX-^  +  zs  )  , 

Step  9  Go  to  step  3 


Numerical  Results 

The  method  has  been  implemented  in  an  APL  environment  and 
tested  on  small  problems: 

The  test  problem  consists  of  a  capacity  expansion  planning 
problem  of  electric  utilities.  There  are  two  types  of  generators 
with  different  investment  and  operations  costs,  which  can  be 
built  and  operated  in  a  way  to  meet  the  demand,  given  by  a  load 
duration  curve  of  three  load  levels:  base,  medium  and  peak  load. 
Both  the  availability  of  power  from  the  generators  and  the  loads 
of  the  three  load  levels  are  considered  to  be  uncertain.  We 
assume  a  discrete  distribution  of  four  outcomes  of  the  first, 
five  outcomes  of  the  second  generator  and  four  outcomes  of  each 
of  the  demand.  The  model  is  formulated  as  a  complete  recourse 
model,  that  means  we  ensure  feasibility  of  the  subproblems  for 
any  X.  "Unserved  demand"  can  be  purchased  with  costs,  higher  than 
the  costs  of  production  (penalty  costs) .  The  generators  have 
intersecting  costs  functions  so  that  building  of  both  generators 
is  reasonable.  The  building  of  the  generators  is  in  competition 
with  the  purchase  of  load. 

The  assumptions  on  the  stochastic  variables  imply  1280 
possible  outcomes,  i.e.  1280  subproblems  have  to  be  solved  in 
each  iteration  of  the  Benders  decomposition  in  the  universe  case. 
We  compare  the  universe  solution  (the  test  problem  is  small 
enough  to  solve  the  universe  problem)  with  solutions  gained  by 
the  importance  sampling  algorithm. 

Table  1  shows  the  results  in  the  case  of  20  samples  out  of 
the  possible  1280  combinations  and  without  an  improvement  phase. 
100  replications  of  the  same  experiment  were  run  to  get 
statistical  information  about  the  accuracy  of  the  solution  and 
the  estimate  of  the  accuracy  of  the  solution. 

The  mean  of  the  objective  function  value  (total  costs) 
differs  from  the  universe  solution  by  0.2%.  From  the  distribution 
of  the  optimum  objective  function  value  given  by  the  100 
replications  a  95%  confidence  interval  can  be  computed:  plus 
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minus  1.5%.  A  95%  confidence  interval  of  each  solution  of  the  100 
replications  is  estimated.  The  mean  of  all  confidence  intervals 
is  1.4%,  which  is  a  slight  underestimation  of  the  true  confidence 
interval . Th i s  error  is  caused  by  the  estimation  method.  The 
estimation  method  however  improves  in  accuracy  with  decreasing 
confidence  intervals  of  the  solution.  The  coverage  rate  of  92% 
expresses  that  in  92%  of  the  100  replications  the  correct  answer 
of  the  universe  solution  is  covered  by  the  estimated  confidence 
interval.  This  again  shows  that  we  are  slightly  underestimating: 
if  the  computation  the  95%  confidence  interval  was  exact,  we 
would  expect  a  coverage  rate  of  95%. 

The  bias  and  the  confidence  interval  of  the  optimum 
strategies  (the  loads  X  to  be  installed)  are  larger  than  those  of 
the  optimum  objective  function  value.  The  optimum  seems  to  be 
flat:  several  different  strategies  lead  close  to  the  optimum 
costs.  Confidence  intervals  of  52%  and  48%  are  computed. 

In  the  above  example  a  sample  size  of  20  samples  was  chosen. 
Additional  computational  effort  is  also  needed  to  obtain  the 
importance  distribution,  e.g.  17  subproblems  have  to  be  solved  in 
each  iteration  to  obtain  the  marginal  costs  .  Compared  to  the 
universe  solution  the  method  e.g.  achieves  with  2.9%  of 
computational  effort  a  solution  which  is  with  95%  confidence 
within  an  interval  of  plus  minus  1.5%  of  the  correct  answer. 

Table  2  represents  computational  results  for  an  enlarged 
model:  We  consider  a  triangular  network  of  tree  nodes  connected 
by  transmission  lines.  There  is  a  load  in  each  node  and  one  type 
of  generator  in  two  of  the  nodes.  The  generators  in  the  two  nodes 
and  the  transmission  lines  have  to  be  built  to  serve  the  demand. 
The  generators  and  the  transmission  lines  respectively  differ  in 
investment  and  operation  costs.  We  consider  two  time  periods  (in 
the  sense  of  "Here  and  Now"  decision  making,  and  independent  in 
the  uncertain  parameters) ,  dependent  in  the  deterministic  part  of 
the  problem.  Thus  there  are  10  decision  variables  in  the  first 
stage  (master)  problem.  The  distributions  of  the  uncertain 
parameters  are  identical  to  the  first  test  problem.  The  problem 


is  solved  with  20  and  40  samples  out  of  1280  universe  scenarios 
using  importance  sampling  and  the  results  are  compared.  The 
statistical  information  is  obtained  by  respectively  30 
replications  of  the  same  experiment. 

One  can  see  decreasing  bias,  decreasing  confidence  intervals 
and  improving  estimations  of  the  confidence  intervals  with 
increased  sample  size.  A  larger  number  of  binding  cuts  (e.g. 
depending  on  the  number  of  first  stage  variables  and  thus  the 
problem  size)  in  the  final  solution  implies  more  error.  Therefore 
a  higher  sample  size  is  required  to  obtain  the  same  accuracy  of 
the  solution,  compared  to  smaller  problems. 


Table  1:  20  samples 


#iter 

correct 

mean 

7.9 

95%conf 

% 

bias 

% 

G1 

1800.0 

1568 . 8 

52 . 0 

-12.8 

G2 

1571.4 

1793.9 

47 . 7 

14 . 2 

theta 

13513.7 

13922 . 8 

15.9 

3 . 0 

obj 

24642 . 3 

24682.7 

1.5 

0.2 

est.  conf  1.4 

coverage  0.92 
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Table  2o  •  20  samples 


correct 


mean 


95%conf  bias 
%  % 


# iter  17 

period  1  G1  2559 

G2  5082 

T1  1369 

T2  1119 

T3  0 

thetal  13336 

period2  G1  1889 

G2  7624 

T1  2787 

T2  1801 

T3  0 

theta2  28058 

obj  99325 


13 . 8 

1656 

116.0 

-35.3 

5577 

26.7 

9.7 

1860 

65.9 

35.8 

1331 

140.2 

19.0 

72 

435.2 

16072 

29.4 

20.5 

2352 

149.6 

24 . 5 

7759 

43 . 3 

1 . 8 

2307 

77.9 

-17 . 2 

1721 

71.0 

-4 . 4 

212 

386.5 

26712 

29 . 0 

-4 . 8 

101199 

3.7 

1.9 

est.  conf 
coverage 


2.4 

0.6 
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Table  2b:  40  samples 


correct 


#iter  17 

period  1  G1  2559 

G2  5082 

T1  1369 

T2  1119 

T3  0 

thetal  13336 

period2  G1  1889 

G2  7624 

T1  2787 

T2  1801 

T3  0 

theta2  28058 

obj  99325 


est.  conf 
coverage 


mean  95%conf  bias 

%  % 


14.7 

1189 

58 . 0 

-53 . 5 

5810 

17 . 3 

15.7 

1744 

55.6 

27.4 

1290 

49 . 1 

15.2 

213 

710.2 

17123 

10.7 

28 . 4 

1445 

102 . 5 

-23.5 

8670 

17 . 8 

13 . 7 

3102 

44 . 7 

11.3 

1758 

50.8 

-2 . 4 

226 

395.0 

27196 

16.8 

-3 . 1 

100347 

2.3 

1.0 

2.1 

0.7 
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